On Equivalence and Cores for Incomplete Databases in Open and Closed Worlds
Data exchange heavily relies on the notion of incomplete database instances. Several semantics for such instances have been proposed and include open (OWA), closed (CWA), and open-closed (OCWA) world. For all these semantics important questions are: whether one incomplete instance semantically implies another; when two are semantically equivalent; and whether a smaller or smallest semantically equivalent instance exists. For OWA and CWA these questions are fully answered. For several variants of OCWA, however, they remain open. In this work we address these questions for Closed Powerset semantics and the OCWA semantics of [Leonid Libkin and Cristina Sirangelo, 2011]. We define a new OCWA semantics, called OCWA*, in terms of homomorphic covers that subsumes both semantics, and characterize semantic implication and equivalence in terms of such covers. This characterization yields a guess-and-check algorithm to decide equivalence, and shows that the problem is NP-complete. For the minimization problem we show that for several common notions of minimality there is in general no unique minimal equivalent instance for Closed Powerset semantics, and consequently not for the more expressive OCWA* either. However, for Closed Powerset semantics we show that one can find, for any incomplete database, a unique finite set of its subinstances which are subinstances (up to renaming of nulls) of all instances semantically equivalent to the original incomplete one. We study properties of this set, and extend the analysis to OCWA*.
Answering Queries using Views over Probabilistic XML: Complexity and Tractability
We study the complexity of query answering using views in a probabilistic XML
setting, identifying large classes of XPath queries -- with child and
descendant navigation and predicates -- for which there are efficient (PTime)
algorithms. We consider this problem under the two possible semantics for XML
query results: with persistent node identifiers and in their absence.
Accordingly, we consider rewritings that can exploit a single view, by means of
compensation, and rewritings that can use multiple views, by means of
intersection. Since in a probabilistic setting queries return answers with
probabilities, the problem of rewriting goes beyond the classic one of
retrieving XML answers from views. For both semantics of XML queries, we show
that, even when XML answers can be retrieved from views, their probabilities
may not be computable. For rewritings that use only compensation, we describe a
PTime decision procedure, based on easily verifiable criteria that distinguish
between the feasible cases -- when probabilistic XML results are computable --
and the unfeasible ones. For rewritings that can use multiple views, with
compensation and intersection, we identify the most permissive conditions that
make probabilistic rewriting feasible, and we describe an algorithm that is
sound in general, and becomes complete under fairly permissive restrictions,
running in PTime modulo worst-case exponential time equivalence tests. This is
the best we can hope for since intersection makes query equivalence intractable
already over deterministic data. Our algorithm runs in PTime whenever
deterministic rewritings can be found in PTime.
Comment: VLDB201
Increasing environmental compatibility of metal production
Building materials production generates large amounts of harmful substances that poison the atmosphere. One of the major sources of urban environmental pollution is the metallurgical industry. Concentration is one of the most important processes, in which empty components are extracted from the rock. During ore concentration, increasing amounts of man-made waste are generated; they pollute the air and large areas around the factories that discharge them. This reduces both the space for people to live in and the room for cities to function and develop. It should be noted that metal production enterprises have accumulated billions of tons of industrial wastes (tailings) that include a large amount of iron-containing materials and rocks; these can be used as building materials, for example, as a mineral powder when preparing fine-grained concrete, as well as in the construction of roads and houses, in paint production, etc.
Towards Ontology Reshaping for KG Generation with User-in-the-Loop: Applied to Bosch Welding
Knowledge graphs (KG) are used in a wide range of applications. The
automation of KG generation is highly desirable due to the data volume and variety
in industries. One important approach of KG generation is to map the raw data
to a given KG schema, namely a domain ontology, and construct the entities and
properties according to the ontology. However, the automatic generation of such
ontology is demanding and existing solutions are often not satisfactory. An
important challenge is a trade-off between two principles of ontology
engineering: knowledge-orientation and data-orientation. The former one
prescribes that an ontology should model the general knowledge of a domain,
while the latter one emphasises reflecting the data specificities to ensure
good usability. We address this challenge by our method of ontology reshaping,
which automates the process of converting a given domain ontology to a smaller
ontology that serves as the KG schema. The domain ontology can be designed to
be knowledge-oriented and the KG schema covers the data specificities. In
addition, our approach allows the option of including user preferences in the
loop. We demonstrate our on-going research on ontology reshaping and present an
evaluation using real industrial data, with promising results.
GraphMAE2: A Decoding-Enhanced Masked Self-Supervised Graph Learner
Graph self-supervised learning (SSL), including contrastive and generative
approaches, offers great potential to address the fundamental challenge of
label scarcity in real-world graph data. Among both sets of graph SSL
techniques, the masked graph autoencoders (e.g., GraphMAE)--one type of
generative method--have recently produced promising results. The idea behind
this is to reconstruct the node features (or structures)--that are randomly
masked from the input--with the autoencoder architecture. However, the
performance of masked feature reconstruction naturally relies on the
discriminability of the input features and is usually vulnerable to disturbance
in the features. In this paper, we present a masked self-supervised learning
framework GraphMAE2 with the goal of overcoming this issue. The idea is to
impose regularization on feature reconstruction for graph SSL. Specifically, we
design the strategies of multi-view random re-mask decoding and latent
representation prediction to regularize the feature reconstruction. The
multi-view random re-mask decoding is to introduce randomness into
reconstruction in the feature space, while the latent representation prediction
is to enforce the reconstruction in the embedding space. Extensive experiments
show that GraphMAE2 can consistently generate top results on various public
datasets, including at least 2.45% improvements over state-of-the-art baselines
on ogbn-Papers100M with 111M nodes and 1.6B edges.
Comment: Accepted to WWW'2